10 research outputs found

    Reviewing the connection between speech and obstructive sleep apnea

    Full text link
    The electronic version of this article is the complete one and can be found online at: http://link.springer.com/article/10.1186/s12938-016-0138-5

Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in OSA speakers has led to the hypothesis that automatic analysis of speech can be used for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications.

Methods: A large speech database including 426 male Spanish speakers suspected to suffer from OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals using machine learning techniques to predict the apnea-hypopnea index (AHI), which describes the severity of the patients' condition, or to classify individuals according to their OSA severity. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modelled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several previously proposed OSA classification approaches. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied.

Results: The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This fact prompted us to review these approaches carefully, also testing some reported results over our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results.
Conclusion: The methodological deficiencies observed after critically reviewing previous research can be relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We have found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, that are also correlated with other patient characteristics (age, height, sex, …) acting as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will not only be a useful example of relevant issues when using machine learning for medical diagnosis, but will also help to guide further research on the connection between speech and OSA.

The authors thank Sonia Martinez Diaz for her effort in collecting the OSA database used in this study. This research was partly supported by the Ministry of Economy and Competitiveness of Spain and the European Union (FEDER) under project "CMC-V2", TEC2012-37585-C02
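    The second pitfall above, overfitting of feature selection, can be illustrated with a small self-contained sketch on purely synthetic data (the dimensions and SVR setup here are illustrative, not the paper's): selecting the "best" features on the full dataset before cross-validation makes even pure noise look predictive, while refitting the selection inside each training fold gives an honest estimate.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Pure noise: 60 "patients", 500 random features, a random target.
# Any apparent predictive power can only come from overfitting.
X = rng.normal(size=(60, 500))
y = rng.normal(size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Pitfall: pick the 10 "best" features on ALL data, then cross-validate.
X_sel = SelectKBest(f_regression, k=10).fit_transform(X, y)
leaky = cross_val_score(SVR(), X_sel, y, cv=cv, scoring="r2").mean()

# Correct: refit the feature selection inside every training fold.
pipe = make_pipeline(SelectKBest(f_regression, k=10), SVR())
honest = cross_val_score(pipe, X, y, cv=cv, scoring="r2").mean()

print(f"leaky R^2 = {leaky:.2f}, honest R^2 = {honest:.2f}")
```

The leaky protocol reports a higher score than the honest one even though the data contain no signal at all, which is exactly the false-discovery mechanism the review describes.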

    Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment

    Full text link
    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposals for less costly procedures based on the analysis of patients' facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way, aiming to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation called the i-vector. A set of local craniofacial features related to OSA is extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied to the facial features and i-vectors to estimate the AHI.

The activities in this paper were funded by the Spanish Ministry of Economy and Competitiveness and the European Union (FEDER) as part of the TEC2012-37585-C02 (CMC-V2) project. The authors also thank Sonia Martinez Diaz for her effort in collecting the OSA database used in this study
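    The scheme above can be sketched as a simple early fusion: concatenate the craniofacial features and the i-vector, then regress AHI with SVR. The feature dimensionalities and AHI values below are synthetic placeholders, not the paper's data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n = 285                                   # cohort size reported in the paper
ivectors = rng.normal(size=(n, 400))      # hypothetical i-vector dimensionality
facial = rng.normal(size=(n, 20))         # hypothetical craniofacial measurements
ahi = rng.uniform(0, 60, size=n)          # synthetic AHI values (events/hour)

# Early fusion: stack both modalities into one feature vector per subject.
X = np.hstack([ivectors, facial])
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X, ahi)
pred = model.predict(X)
print(pred.shape)
```

Scaling before the RBF-kernel SVR matters here because the two modalities would otherwise contribute on very different numeric ranges.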

    Study of speech and craniofacial features in obstructive sleep apnea patients

    Full text link
    This Thesis explores the speech and craniofacial phenotype characterization of Obstructive Sleep Apnea (OSA) patients using state-of-the-art speaker voice characterization technologies and image processing techniques for face recognition, along with the study and analysis of supervised machine learning methods for evaluating these speech and craniofacial features as predictors of OSA severity. OSA is a common sleep-related breathing disorder affecting mainly men. It is characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The diagnosis of OSA is carried out at a sleep unit in a hospital by means of the polysomnography (PSG) test.
This test requires an overnight stay of the patient at the sleep unit under the supervision of a clinician to monitor breathing patterns, heart rhythm, and limb movements, making it an invasive and costly method whose waiting list may exceed one year. As an alternative, many diagnosis schemes have been developed to help reduce waiting lists and accelerate the detection of severe cases, such as questionnaires for OSA screening, and schemes based on medical imaging, for instance oropharyngeal visual inspection (i.e. the Mallampati test) and craniofacial assessment by means of analysis techniques (e.g. cephalometry) applied to images created by advanced visual-representation methods (e.g. computed tomography, magnetic resonance imaging). Although these methods can help to increase the detection of positive cases and provide reliable results, most of them either lack generalization (e.g. questionnaires) or are costly and invasive for patients (e.g. those used for craniofacial assessment). Early studies on OSA assessment using medical-imaging techniques and anthropometric characterization found evidence of abnormalities in the upper airway structures of OSA subjects. Consequently, abnormal or particular speech features may be expected in OSA speakers as a result of the altered structure or function of their upper airways. These facts have led to proposing less costly procedures based on the analysis of patients' facial images and voice recordings to help with OSA detection and severity assessment. Therefore, this Thesis explores the speech and craniofacial characterization of OSA patients by means of speech and craniofacial features based on automatic speaker recognition systems and face characterization techniques, respectively: 1) supervectors and i-vectors, and 2) local features, statistical-model-based features, and deep-learning-based features.
Using an existing database of 729 patients (204 women, 525 men), speech and craniofacial features were evaluated for OSA prediction by means of supervised machine learning models. There are differences in how OSA affects men and women, such as symptoms and risk factors, which could act as confounding factors; therefore, it is important to emphasize that experiments were performed separately for each gender. Furthermore, previous speech-based OSA detection studies reported successful results; however, after a review of their results and methodologies, we found several limitations, related to the small number of training samples as well as machine learning pitfalls in the methodology and validation scheme, such as feature selection over a limited number of samples and high-dimensional features, resulting in a high probability of overfitting of the prediction model. The ultimate motivation of this Thesis is to explore automatic speech processing and facial characterization techniques for OSA assessment, together with their evaluation by means of an exhaustive validation scheme, in order to face the limitations related to database size and to avoid the machine learning pitfalls due to the incorrect treatment of supervised learning models. Finally, to the best of our knowledge, the present Thesis is the only study that approaches the speech and craniofacial phenotype characterization in women by using automatic speech processing and facial characterization techniques
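    A minimal sketch of such a gender-separated, exhaustive validation scheme is nested cross-validation run independently per cohort. The random stand-in data, 30-dimensional features, and SVM hyperparameter grid below are assumptions for illustration, not the Thesis's actual setup.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Hypothetical stand-ins for the gender-split cohorts (525 men, 204 women).
cohorts = {
    "men": (rng.normal(size=(525, 30)), rng.integers(0, 2, 525)),
    "women": (rng.normal(size=(204, 30)), rng.integers(0, 2, 204)),
}

results = {}
for group, (X, y) in cohorts.items():
    # Inner loop tunes C on the training folds only; the outer loop
    # estimates generalization, so no test data leaks into model selection.
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]},
                         cv=StratifiedKFold(3, shuffle=True, random_state=0))
    outer = cross_val_score(inner, X, y,
                            cv=StratifiedKFold(5, shuffle=True, random_state=0))
    results[group] = outer.mean()
print(results)
```

Keeping the two cohorts in fully separate loops prevents sex-linked differences (a confounder named above) from inflating a single pooled score.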

    Análisis de parámetros nasales en unidades lingüísticas para el reconocimiento de locutor

    Full text link
    In recent years, automatic speaker recognition systems working at the acoustic/spectral level, such as GMM-UBM, i-vector and PLDA systems, have demonstrated good performance in text-independent speaker recognition and high robustness to speaker and session variability. Similarly, the use of linguistic units (high-level features) in speaker recognition has demonstrated desirable properties such as high discrimination and strong gains when fused with short-time spectral systems. However, these automatic methods differ from human speaker identification, which relies on perceptual speaker identification (PSI). PSI is related to the speaker individualities present in speech, and its accuracy depends on which types of sounds are presented to the listeners. For example, listeners can identify speakers more accurately when vowels and voiced consonants are presented to them, specifically liquids (in English) and nasal consonants. The latter are consistently effective for PSI owing to their greater inter-speaker variability and smaller intra-speaker variability. In this sense, there are cues of speaker individualities arising from nasalization.
Therefore, several researchers have identified a number of acoustic and perceptual correlates of nasality, and a set of acoustic parameters (AP), also called nasal parameters, has been proposed to capture these correlates and to measure the degree of nasalization in speech. Many front-ends have been developed based on the analysis of APs, the most important references for this Master's Thesis being: automatic detection of vowel nasalization, nasality measures for speaker recognition data selection and performance prediction, and clinical assessment of nasal speech quality. Therefore, this Master's Thesis aims to explore this set of acoustic parameters in phonetic units and to analyse their levels of discriminability, in terms of inter-variability and intra-variability, that might contribute to the improvement of speaker identification
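    One common heuristic for the inter- versus intra-variability notion above is an F-ratio: the variance of per-speaker means divided by the mean within-speaker variance of a parameter. The sketch below uses synthetic data and is only one possible discriminability measure, not necessarily the one used in the Thesis.

```python
import numpy as np

def f_ratio(feature, speaker_ids):
    """Inter- to intra-speaker variance ratio of one acoustic parameter.
    Higher values mean the parameter separates speakers better."""
    speakers = np.unique(speaker_ids)
    means = np.array([feature[speaker_ids == s].mean() for s in speakers])
    inter = means.var()                                       # spread of speaker means
    intra = np.mean([feature[speaker_ids == s].var() for s in speakers])
    return inter / intra

rng = np.random.default_rng(3)
ids = np.repeat(np.arange(10), 50)        # 10 speakers, 50 tokens each
# A parameter with a strong per-speaker offset vs. one with a weak offset.
strong = rng.normal(np.repeat(rng.normal(0, 3, 10), 50), 1.0)
weak = rng.normal(np.repeat(rng.normal(0, 0.3, 10), 50), 1.0)
print(f_ratio(strong, ids), f_ratio(weak, ids))
```

A nasal parameter with a high F-ratio would be a good candidate feature for speaker identification, which is the selection criterion the Thesis investigates.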

    A Comparison of Hybrid and End-to-End ASR Systems for the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge

    No full text
    This paper describes a comparison between hybrid and end-to-end Automatic Speech Recognition (ASR) systems, which were evaluated on the IberSpeech-RTVE 2020 Speech-to-Text Transcription Challenge. Deep Neural Networks (DNNs) are currently the most promising technology for ASR. In the last few years, traditional hybrid models have been evaluated and compared to end-to-end ASR systems in terms of accuracy and efficiency. We contribute two different approaches: a hybrid ASR system based on a DNN-HMM, and two state-of-the-art end-to-end ASR systems based on Lattice-Free Maximum Mutual Information (LF-MMI). To address the high difficulty of transcribing recordings with different speaking styles and acoustic conditions, from TV studios to live recordings, data augmentation and Domain Adversarial Training (DAT) techniques were studied. Multi-condition data augmentation applied to our hybrid DNN-HMM yielded WER improvements in noisy scenarios (about 10% relative). In contrast, the results obtained using an end-to-end PyChain-based ASR system were far from our expectations. Nevertheless, we found that including DAT techniques gave a relative WER improvement of 2.87% compared to the PyChain-based system
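    The multi-condition augmentation mentioned above boils down to mixing noise into clean training audio at several target signal-to-noise ratios. This is a minimal sketch of that mixing step; the synthetic tone, noise source, and SNR values are illustrative, not the challenge data.

```python
import numpy as np

def augment_multicondition(clean, noise, snr_db):
    """Mix a noise segment into a clean waveform at a target SNR (in dB)."""
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_clean / p_scaled_noise hits the target SNR.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(4)
sr = 16000
clean = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # 1 s tone at 16 kHz
noise = rng.normal(size=sr)
# One augmented copy per noise condition, e.g. 0, 10 and 20 dB SNR.
augmented = [augment_multicondition(clean, noise, snr) for snr in (0, 10, 20)]
print(len(augmented), augmented[0].shape)
```

Training on such copies exposes the acoustic model to the range of conditions it will meet at test time, which is why the hybrid system improved in noisy scenarios.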

    Automatic Prosodic Analysis to Identify Mild Dementia

    No full text
    This paper describes an exploratory technique to identify mild dementia by assessing the degree of speech deficits. A total of twenty participants took part in the experiment: ten patients with a diagnosis of mild dementia and ten healthy controls. An audio session for each subject was recorded following a methodology developed for the present study. Prosodic features in patients with mild dementia and healthy elderly controls were measured using automatic prosodic analysis on a reading task. A novel method was used to gather twelve prosodic features from the speech samples. The best classification rate achieved was 85% accuracy, using four prosodic features. The results show that the proposed computational speech analysis offers a viable alternative for the automatic identification of dementia features in elderly adults
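    One family of prosodic measures of the kind described above is pause statistics derived from a frame-energy voice-activity threshold. The sketch below is an assumption-laden illustration (frame sizes, the −30 dB threshold, and the synthetic signal are all choices of ours, not the study's method).

```python
import numpy as np

def pause_features(signal, sr=16000, frame=0.025, hop=0.010, thresh_db=-30):
    """Return (number of pauses, mean pause duration in seconds) from a
    simple frame-energy silence threshold relative to the loudest frame."""
    flen, hlen = int(frame * sr), int(hop * sr)
    n = 1 + (len(signal) - flen) // hlen
    energy = np.array([np.mean(signal[i*hlen:i*hlen+flen] ** 2) for i in range(n)])
    db = 10 * np.log10(energy + 1e-12)
    silent = db < (db.max() + thresh_db)
    # Runs of consecutive silent frames count as pauses.
    edges = np.diff(np.concatenate(([0], silent.astype(int), [0])))
    starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0]
    durations = (ends - starts) * hop
    return len(durations), durations.mean() if len(durations) else 0.0

rng = np.random.default_rng(5)
t = np.arange(0, 2.0, 1 / 16000)
speech = np.sin(2 * np.pi * 150 * t)                   # 2 s of synthetic "speech"
speech[16000:24000] = 0.001 * rng.normal(size=8000)    # insert a 0.5 s pause
n_pauses, mean_dur = pause_features(speech)
print(n_pauses, round(mean_dur, 2))
```

Pause rate and duration are clinically interpretable, which makes this kind of feature attractive for screening tasks like the one reported here.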

    Fundamentos para el cálculo - MA384 201801

    No full text
    Description: Fundamentos para el Cálculo is a theoretical and practical course, taught in blended mode, aimed at first-semester Business Administration students. Its first units cover equations, inequalities, and graphs in the plane, in preparation for the study of functions in Unit 3 and their use in solving application problems with real-world context. Classes are delivered in three weekly sessions: the first two face-to-face and the third online. Purpose: The course is designed to develop in students the Quantitative Reasoning competency, at level 1, through the study of problem situations, hereafter referred to as cases, whose mastery will help them perform successfully in situations involving mathematical thinking for decision making